Skip to content

prov/efa: Cleanup EP/CQ#12167

Open
alekswn wants to merge 4 commits intoofiwg:mainfrom
alekswn:cleanup-ep-cq
Open

prov/efa: Cleanup EP/CQ#12167
alekswn wants to merge 4 commits intoofiwg:mainfrom
alekswn:cleanup-ep-cq

Conversation

@alekswn
Copy link
Copy Markdown
Contributor

@alekswn alekswn commented Apr 21, 2026

  • remove unused efa_dgram_ep_msg_ops and efa_dgram_ep_rma_ops
  • move efa-direct-only fields from efa_base_ep to new efa_ep struct
  • add DGRAM unit test for construct_ibv_qp_init_attr_ex]

@alekswn alekswn changed the title Cleanup ep cq prov/efa: Cleanup EP/CQ Apr 21, 2026
@alekswn alekswn requested a review from a team April 21, 2026 23:01
@alekswn alekswn force-pushed the cleanup-ep-cq branch 2 times, most recently from 4a71eae to c70bce1 Compare April 24, 2026 21:56
@alekswn
Copy link
Copy Markdown
Contributor Author

alekswn commented Apr 25, 2026

bot:aws:retest

@alekswn alekswn force-pushed the cleanup-ep-cq branch 3 times, most recently from 3d8239c to 314b561 Compare April 28, 2026 01:06
alekswn added 4 commits April 28, 2026 10:42
Remove dead DGRAM-specific ops structs that are no longer referenced
after the endpoint unification.

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
…ect_ep struct

Extract ope_pool and ope_list from efa_base_ep into a new efa_direct_ep
struct (defined in efa_direct_ep.h) that wraps efa_base_ep as its first
member for container_of castability.

This separates efa-direct (DGRAM) endpoint state from the shared base,
keeping efa_base_ep clean for both efa-direct and efa-protocol paths.

Guard container_of casts with EFA_INFO_TYPE_IS_RDM checks where the
base_ep_list contains mixed endpoint types.

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
Add test_efa_base_ep_construct_ibv_qp_init_attr_ex_dgram to verify
QP initialization attributes for DGRAM endpoints, complementing the
existing RDM test.

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
Add the new efa_direct_ep.h header to the list of distributed headers
so it is included in release tarballs.

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
@alekswn alekswn marked this pull request as ready for review April 28, 2026 17:58
@shijin-aws
Copy link
Copy Markdown
Contributor

Generally, some parts are missing here. The goal of the refactor is to eliminate the fabric condition in the common efa_base_ep code. I am pretty sure there are some still there. like

if (EFA_INFO_TYPE_IS_DIRECT(base_ep->info)) {

struct efa_direct_ep {
struct efa_base_ep base_ep;

struct ofi_bufpool *ope_pool; /**< pool for efa_direct_ope */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a comment here that we make the ope_pool/ope_list additional for efa_direct_ep because of a need to track the outstanding operations that still reference MRs. This should be a temporary addition and finally removed or moved to the efa_base_ep when we have a unified approach to make it cover both efa and efa-direct

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it more, I think we should keep ope_pool and ope_list in the base_ep and make efa_rdm_ep use them. efa_rdm_ep also have a ope_pool which is the same type. efa_rdm_ep has separate txe_list and rxe_list but they are only used for final cleanup purpose. There is no reason we cannot combine them into a single ope_list and scan it once and clean up based on the ope type accordingly

Comment thread prov/efa/Makefile.include
prov/efa/src/efa_cq.h \
prov/efa/src/efa_cntr.h \
prov/efa/src/efa_base_ep.h \
prov/efa/src/efa_direct_ep.h \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have been part of your first commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants